Method and system of database divisional management for parallel database system

ABSTRACT

A method and a system of database divisional management for use with a parallel database system comprising an FES (front end server), BES&#39;s (back end servers), an IOS (I/O server) and disk units. The numbers of processors assigned to the FES, BES&#39;s and IOS, the number of disk units, and the number of partitions of the disk units are determined in accordance with the load pattern in question. Illustratively, there may be established a configuration of one FES, four BES&#39;s, one IOS and eight disk units. The number of BES&#39;s is varied from one to four depending on the fluctuation in load, so that a scalable system configuration is implemented. When the number of BES&#39;s is increased or decreased, only the management information thereabout is transferred between nodes and not the data, whereby the desired degree of parallelism is obtained for high-speed query processing.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of application Ser. No.09/665,448, filed Sep. 19, 2000; which is a continuation of applicationSer. No. 09/429,398, filed Oct. 28, 1999, now U.S. Pat. No. 6,192,359;which is a continuation of application Ser. No. 09/153,612, filed onSep. 15, 1998, now U.S. Pat. No. 6,101,495; which is a continuationapplication of application Ser. No. 08/341,953, filed on Nov. 16, 1994,now U.S. Pat. No. 5,813,005, the entire disclosures of which areincorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method and a system of databasedivisional management for use with a parallel database system. Moreparticularly the invention relates to a database divisional managementmethod and a parallel database system whereby the number of processorsor the number of disk units for database processing is optimized undergiven loads.

2. Description of the Prior Art

Parallel database systems are proposed illustratively by David Dewittand Jim Gray in “Parallel Database Systems: The Future of HighPerformance Database Systems” (CACM, Vol. 35, No. 6, 1992, pp. 85-98).Parallel database systems of the kind proposed above involve having aplurality of processors tightly or loosely connected to one another andsubjected to database divisional management.

How to configure a conventional parallel database system is at theuser's discretion. Once established, the conventional systemconfiguration must remain as it is. With its lack of flexibility, thesystem configuration may be unsuitable from the start for dealing withenvisaged loads or may become incapable of addressing the loads sometime after the start of its operative state. In such cases, the desireddegree of parallel processing is not available and/or high-speed queryprocessing is not implemented.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a databasedivisional management method and a parallel database system wherebydesired degrees of parallel processing and high-speed query processingare available.

In carrying out the invention and according to a first aspect thereof,there is provided a database divisional management system for use with aparallel database system having a storage medium, storage and managementmeans for storing and managing a database in the storage medium, and aplurality of access means for accessing the database in response toquery inputs. This database divisional management system comprises:generation means for generating a procedure for processing the queryinputs; division means for dividing the database into a plurality ofpartitions in accordance with the load pattern provided for executingthe generated processing procedure; and determination means fordetermining the number of access means for simultaneously accessing thepartitions of the database.

In a preferred structure according to the invention, the storage andmanagement means determines the physical addresses corresponding tological addresses at which the plurality of access means access thepartitions of the database.

With this structure, the load pattern is determined by the accessefficiency of each of the access means and by the amount of informationstored in the partitions of the database accessed by the access means.

According to a second aspect of the invention, there is provided adatabase divisional management method for use with a parallel databasesystem comprising an FES node for analyzing and optimizing user queriesand generating a processing procedure in response thereto, BES nodes foraccessing a database on the basis of the processing procedure generatedby the FES node, an IOS node having a storage medium (i.e., disk units)and capable of storing and managing the database in the storage medium,and a network for connecting the FES, BES and IOS nodes. This databasedivisional management method comprises the steps of: calculating theload pattern by which to perform database processing using theprocessing procedure; and determining the number of processors assignedto the FES node, the number of processors assigned to the BES nodes, thenumber of processors assigned to the IOS node, the number of disk unitsof the IOS node, and the number of partitions of the disk units inaccordance with the load pattern for data processing.

According to a third aspect of the invention, there is provided adatabase divisional management method for use with a parallel databasesystem comprising an FES node for analyzing and optimizing user queriesand generating a processing procedure in response thereto, BES nodeshaving a storage-medium (i.e., disk units) in which to store a databaseand capable of accessing the database on the basis of the processingprocedure generated by the FES node, and a network for connecting theFES and BES nodes. This database divisional management method comprisesthe steps of: calculating the load pattern by which to perform databaseprocessing using the processing procedure; and determining the number ofprocessors assigned to the FES node, the number of processors assignedto the BES nodes, the number of disk units of the BES nodes, and thenumber of partitions of the disk units in accordance with the loadpattern for database processing.

According to a fourth aspect of the invention, there is provided adatabase divisional management method for use with a parallel databasesystem comprising an FES node for analyzing and optimizing user queriesand generating a processing procedure in response thereto, BES nodes foraccessing a database on the basis of the processing procedure generatedby the FES node, an IOS node having a storage medium (i.e., disk units)and capable of storing and managing the database in the storage medium,and a network for connecting the FES, BES and IOS nodes. This databasedivisional management method comprises the steps of: determining theupper limit number of pages which are accessible in parallel and whichrequire a constant time each when the database is scanned for accessthereto; and determining the number of processors assigned to the FESnode, the number of processors assigned to the BES nodes, the number ofprocessors assigned to the IOS node, the number of disk units of the IOSnode, and the number of partitions of the disk units in accordance withthe upper limit number of pages.

According to a fifth aspect of the invention, there is provided adatabase divisional management method for use with a parallel databasesystem comprising an FES node for analyzing and optimizing user queriesand generating a processing procedure in response thereto, BES nodeshaving a storage medium (i.e., disk units) in which to store and managea database and capable of accessing the database on the basis of theprocessing procedure generated by the FES node, and a network forconnecting the FES and BES nodes. This database divisional managementmethod comprises the steps of: determining the upper limit number ofpages which are accessible in parallel and which require a constant timeeach when the database is scanned for access thereto; and determiningthe number of processors assigned to the FES node, the number ofprocessors assigned to the BES nodes, the number of disk units of theBES nodes, and the number of partitions of the disk units in accordancewith the upper limit number of pages.

According to a sixth aspect of the invention, there is provided adatabase divisional management method for use with a parallel databasesystem comprising an FES node for analyzing and optimizing user queriesand generating a processing procedure in response thereto, BES nodes foraccessing a database on the basis of the processing procedure generatedby the FES node, an IOS node having a storage medium (i.e., disk units)and capable of storing and managing the database in the storage medium,and a network for connecting the FES, BES and IOS nodes. This databasedivisional management method comprises the steps of: calculating theexpected degree of parallelism p according to the load pattern based onthe processing procedure; and determining the number of processors assigned to the FES node, the number of processors assigned to the BESnodes, the number of processors assigned to the IOS node, the number ofdisk units of the IOS node, and the number of partitions of the diskunits in accordance with the expected degree of parallelism p.

According to a seventh aspect of the invention, there is provided adatabase divisional management method for use with a parallel databasesystem comprising an FES node for analyzing and optimizing user queriesand generating a processing procedure in response thereto, BES nodeshaving a storage medium (i.e., disk units) in which to store and managea database and capable of accessing the database on the basis of theprocessing procedure generated by the FES node, and a network forconnecting the FES and BES nodes. This database divisional managementmethod comprises the steps of: calculating the expected degree ofparallelism p according to the load pattern based on the processingprocedure; and determining the number of processors assigned to the FESnode, the number of processors assigned to the BES nodes, the number ofdisk units of the BES nodes, and the number of partitions of the diskunits in accordance with the expected degree of parallelism p.

In another preferred structure according to the invention, the databasedivisional management method further comprises the steps of: calculatingthe optimum number of accessible pages m; calculating the number ofpages s (=m/p) in units of sub-key ranges if key range partitions exist;and having sub-key range partitions in units of s pages for insertingdata into a disk apparatus.

According to an eighth aspect of the invention, there is provided adatabase divisional management method for use with a parallel databasesystem comprising an FES node for analyzing and optimizing user queriesand generating a processing procedure in response thereto, BES nodes foraccessing a database on the basis of the processing procedure generatedby the FES node, an IOS node having a storage medium (i.e., disk units)and capable of storing and managing the database in the storage medium,and a network for connecting the FES, BES and IOS nodes. This databasedivisional management method comprises the steps of: detecting a loadunbalance on the basis of such load information items as the number ofaccessed pages, the number of hit rows and the number of communicationsacquired during execution of the processing procedure; and changing thenumber of processors assigned to the FES node, the number of processorsassigned to the BES nodes, the number of processors assigned to the IOSnode, and the number of disk units of the IOS node so as to eliminatethe load unbalance.

According to a ninth aspect of the invention, there is provided adatabase divisional management method for use with a parallel databasesystem comprising an FES node for analyzing and optimizing user queriesand generating a processing procedure in response thereto, BES nodeshaving a storage medium (i.e., disk units) in which to store and managea database and capable of accessing the database on the basis of theprocessing procedure generated by the FES node, and a network forconnecting the FES and BES nodes. The database divisional managementmethod comprises the steps of: detecting a load unbalance on the basisof such load information items as the number of accessed pages, thenumber of hit rows and the number of communications acquired duringexecution of the processing procedure; and changing the number ofprocessors assigned to the FES node, the number of processors assignedto the BES nodes, and the number of disk units of the BES nodes so as toeliminate the load unbalance.

In a further preferred structure according to the invention, thedatabase divisional management method further comprises the steps of:closing, when online processing is in progress, the key range of adatabase table if at least one of the three numbers consisting of thenumber of processors assigned to the BES nodes, the number of processorsassigned to the IOS node and the number of disk units is to beincreased, the database table being the object to be managed by eitherthe processors or the disk units to be added; assigning the processorsand the disk units anew; succeeding lock information and directoryinformation; updating the dictionary information necessary for nodeassignment control; and releasing the closing of the key rangethereafter if the online processing is still in progress.

In an even further preferred structure according to the invention, thedatabase divisional management method further comprises the steps of:closing, when online processing is in progress, the key range of adatabase table if either the number of processors assigned to the BESnodes or the number of disk units is to be increased, the database tablebeing the object to be managed by either the processors or the diskunits to be added; assigning either the processors or the disk unitsanew; succeeding lock information and directory information; updatingthe dictionary information necessary for node assignment control; movingdata from the existing group of disk units to the newly added diskunits; and releasing the closing of the key range thereafter if theonline processing is still in progress.

In a still further preferred structure according to the invention, thedatabase divisional management method comprises the steps of: closing,when online processing is in progress, the key range of a database tableif at least one of the three numbers consisting of the number ofprocessors assigned to the BES nodes, the number of processors assignedto the IOS node and the number of disk units is to be decreased, thedatabase table being managed by either the processors or the disk unitsto be removed; determining either the processors or the disk units to beremoved; succeeding lock information and directory information; updatingthe dictionary information necessary for node assignment control; andreleasing the closing of the key range thereafter if the onlineprocessing is still in progress.

In another preferred structure according to the invention, the databasedivisional management method further comprises the steps of: closing,when online processing is in progress, the key range of a database tableif at least either the number of processors assigned to the BES nodes orthe number of disk units is to be decreased, the database table beingmanaged by either the processors or the disk units to be removed;determining either the processors or the disk units to be removed;succeeding lock information and directory information; updating thedictionary information necessary for node assignment control; movingdata from the disk units to be removed to the disk units succeedingthose units to be removed; and releasing the closing of the key rangethereafter if the online processing is still in progress.

In a further preferred structure according to the invention, either thenumber of processors or the number of disk units for database processingis changed dynamically.

In operation, the invention of the aspects and preferred structuresoutlined above provides the following major functions and features:

The database divisional management method according to the second aspectof the invention determines the number of processors assigned to each ofthe configured nodes, the number of disk units, and the number ofpartitions of the disk units in accordance with the load pattern fordatabase processing (single item search, single item update, dataretrieval, etc.). The invention embodied in this structure provides asystem configuration suitable for dealing with the load in question,offers the expected degree of parallelism and permits high-speed queryprocessing. The database divisional management method according to thefourth aspect of the invention determines the number of processorsassigned to each of the configured nodes, the number of disk units, andthe number of partitions of the disk units in accordance with the upperlimit number of pages which are accessible in parallel and which requirea constant time each when the database is scanned for access thereto.The invention embodied in this structure realizes high-speed queryprocessing.

The database divisional management method according to the sixth aspectof the invention determines the number of processors assigned to each ofthe configured nodes, the number of disk units, and the number ofpartitions of the disk units in accordance with the expected degree ofparallelism p according to the load pattern. The invention embodied inthis structure provides the expected degree of parallelism.

The database divisional management method of a preferred structurecalculates the number of pages s in units of sub-key ranges using theoptimum number of accessible pages m and the expected degree ofparallelism p (s=m/p), and gets sub-key range partitions in units of spages for inserting data into a disk apparatus. The invention embodiedin this alternative structure allows data to be managed in substantiallyequal partitions.

The database divisional management method according to the eighth aspectof the invention detects a load unbalance, and changes the number ofprocessors assigned to each of the configured nodes or the number ofdisk units so as to eliminate the detected load unbalance. The inventionembodied in this structure keeps the system always configured to allowfor the load fluctuation, provides the expected degree of parallelismand permits high-speed query processing.

When online processing is in progress, the database divisionalmanagement method of another alternative structure closes the key rangeof the relevant database table if either the number of processorsassigned to each of the configured nodes or the number of disk units isto be increased or decreased. After this, necessary information issucceeded and the closing of the key range is released. This structureminimizes the overhead and, in a system configuration comprising the IOSnode, allows the data to be succeeded without moving them across thedisk units.

The database divisional management method of yet another preferredstructure allows either the number of processors or the number of diskunits for database processing to be changed dynamically. This featurereadily provides for a scalable parallel database system.

The foregoing and other objects, advantages, manner of operation andnovel features of the present invention will be understood from thefollowing detailed description when read in connection with theaccompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a parallel database system embodying theinvention;

FIG. 2 is a conceptual view of a database divisional management methodembodying the invention;

FIG. 3 is a conceptual view of optimal node distribution (where an IOSexists) by the inventive database divisional management method;

FIG. 4 is a conceptual view of optimal node distribution (where an IOSdoes not exist) by the inventive database divisional management method;

FIG. 5 is a block diagram of an FES;

FIG. 6 is a block diagram of a BES;

FIG. 7 is a block diagram of an IOS;

FIG. 8 is a block diagram of a DS;

FIG. 9 is a block diagram of a JS;

FIG. 10 is a flowchart of steps performed by a system controller;

FIG. 11 is a flowchart of steps representing a query analysis process;

FIG. 12 is a flowchart of steps representing query analysis;

FIG. 13 is a flowchart of steps representing static optimizationprocessing;

FIG. 14 is a flowchart of steps representing predicate selectivityestimation processing;

FIG. 15 is a flowchart of steps representing access path pruning;

FIG. 16 is a flowchart of steps representing processing procedurecandidate generation;

FIG. 17 is a flowchart of steps representing code generation processing;

FIG. 18 is a flowchart of steps representing query execution processing;

FIG. 19 is a flowchart of steps representing dynamic optimizationprocessing;

FIG. 20 is a flowchart of steps representing code interpretationprocessing;

FIG. 21 is a flowchart of steps representing data load processing;

FIG. 22 is a flowchart of steps representing dynamic load controlprocessing; and

FIG. 23 is a conceptual view of dynamic load control.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Preferred embodiments of the invention will now be described withreference to the accompanying drawings. It should be noted that suchdescription is for illustrative purposes only and is not limitative ofthe invention.

FIG. 1 is a block diagram of a parallel database system 1 embodying theinvention. The parallel database system 1 comprises an FES (front endserver) node, a BES (back end server) node, an IOS (input/output server)node, a DS (dictionary server) node and a JS (journal server) node, allconnected by a network 90. Each of the configured nodes is alsoconnected to another system.

The FES node 75 is composed of at least one processor having no diskunits. This node has front end server functions for analyzing andoptimizing user queries and for generating a processing procedure inresponse thereto.

The BES node 73 comprises at least one processor with no disk units.This node is capable of accessing a database by use of the processingprocedure generated by the FES 75.

The IOS node 70 includes at least one processor and is equipped with atleast one disk unit 80. The disk unit 80 accommodates the database sothat the latter may be managed therein.

If the BES node 73 is supplemented by the functions of the IOS nodes,the IOS node may be omitted. In such a case, the disk unit 80 isconnected to the BES node 73 which now has functions for storing andmanaging the database in the connected disk unit 80.

The database is composed of a plurality of tables. Each table is atwo-dimensional table comprising a plurality of rows. One row includesat least one column (i.e., attribute). These tables, stored in the diskunit 80, are each divided physically into fixed-length pages made of apredetermined number of rows each. The location at which each of thepages is stored in the disk unit 80 is known by referencing directoryinformation.

The DS node 71 comprises at least one processor and has at least onedisk unit 81. This node is capable of managing database definitioninformation as a whole.

The JS node 72 includes at least one processor and has at least one diskunit 82. This node has functions for storing and managing the historyinformation on database updates carried out by each node.

FIG. 2 is a conceptual view of a database divisional management methodembodying the invention. This method determines the number of processorsassigned to each of the FES node 75, BES node 73 and IOS node 70 in FIG.1, the number of configured disk units, and the number of partitions ofthe disk units.

First to be determined is the load pattern of database processing on thebasis of the use-designated work load. The load pattern is any one of adiverse kinds of processing including single item search, single itemupdate and data retrieval. With the load pattern established, the IOSnode 70 determines accordingly the number of partitions of the diskunits 80 for management. (If the BES node 73 incorporates IOS nodefunctions, the BES node 73 determines the number of partitions of thedisk units for management.) Specifically, at the time of schemadefinition, the number of necessary disk units is determined by the waythe database tables are partitioned (in terms of key range, number ofrows per range converted from the number of pages, etc.). When the unitin which the key range is closed or reconfigured is determined, thecombination of BES and IOS units 73 and 70 is determined (the unit inwhich to close or reconfigure the key range is dependent on theconfiguration made of disk units and IOS and BES nodes). In the mannerdescribed, the configuration of the BES and IOS units 73 and 70 and ofthe disk units 80 is determined as follows:

-   -   Partition count: unit number of database partitions that may be        accessed in parallel under management of all BES nodes    -   Disk count per partition: number of disk units assigned to each        partition

FIG. 2 shows an example in which the disk count is 8, the partitioncount is 4 and the disk count per partition is 2.

If the processor performance is enhanced by a factor of n, then thenumber of volumes for use by each partition is multiplied by n. (Itshould be noted that the number of disk units is limited because of theconstraints on the overall data transfer rate between the IOS node 70and the disk units 80.)

Although one disk unit represents one disk drive in the above example,the one-to-one correspondence between a disk drive and a disk unit isnot mandatory for this invention. One disk unit may illustrativelycontain a plurality of disk drives (e.g., disk array apparatus). In thatcase, the unit number of I/O units that may be accessed in parallel maybe regarded as a disk unit.

In the example of FIG. 2 where the normal configuration is composed ofone FES node, four BES nodes, one IOS node and eight disk units, thereneed only be one FES node and one BES node at initial data load time.That is, the initial configuration is made of one FES node, one BESnode, one IOS node and eight disk units. Thus a BES node 731 hasdirectory information about the database stored in disk units 811-842constituting partitions #1-#4.

Where the BES load is so low that one BES node 731 is enough to dealwith the IOS 70 and eight disk units 811-842, the single BES node 731alone accesses the database stored in the eight disk units 811-842. Inthat case, the configuration remains composed of one FES node, one BESnode, one IOS node and eight disk units.

If the load on the BES node 731 is on the increase and its activityratio stays at 100%, a load unbalance may eventually be detected. Inthat case, another BES node 732 is added to the configuration. Since thepartition count is 4, the two BES nodes 731 and 732 are assigned twopartitions each. The BES node 731 thus has directory information aboutthe database stored in the disk units 811-822 constituting partitions#1-#2; the BES node 732 has directory information about the databasestored in the disk units 831-842 constituting partitions #3-#4. Theresulting configuration is composed of one FES node, two BES nodes, oneIOS node and eight disk units.

If the load on the BES nodes 731 and 732 is still on the increase andtheir activity ratio stays at 100%, a load unbalance may also bedetected. In that case, the BES nodes 731 and 732 are supplementedrespectively by BES nodes 733 and 734. Since the partition count is 4,the four BES nodes 731, 732, 733 and 734 are assigned one partitioneach. The BES node 731 thus has directory information about the databasestored in the disk units 811-812 constituting partition #1; the BES node732 has directory information about the database stored in the diskunits 821-822 constituting partition #2; the BES node 733 has directoryinformation about the database stored in the disk units 831-832constituting partition #3; and the BES node 734 has directoryinformation about the database stored in the disk units 841-842constituting partition #4. The resulting configuration is composed ofone FES node, four BES nodes, one IOS node and eight disk units.

When the load is on the decrease and the activity ratio of the BES nodes733 and 734 drops illustratively below the 50% benchmark and remainsthere, it then becomes more efficient for the BES nodes 733 and 734 tobe assigned to other tasks. The BES nodes 733 and 734 whose activityratio is below 50% are thus reconfigured. The reduced configurationcomprises one FES node, two BES nodes, one IOS node and eight diskunits.

As described, where the number of BES nodes is increased or reduceddepending on the amount of load, a scalable system is implemented withina range defined by two scales of configuration: one FES node, one BESnode, one IOS node and eight disk units on the one hand; and one FESnode, four BES nodes, one IOS node and eight disk units on the other.

The IOS node 70 need only address parallel tasks as many as parallellyaccessible disk units regardless of the correspondence between the BESnodes 73 and the disk units 80. This makes it possible to change thecorrespondence between the BES nodes 73 and the disk units 80 by movingthe directory information across the BES nodes without recourse to datamovement. Separating and integrating accesses is thus made easier.

Below is a description of how two representative load patterns of singleitem update and data retrieval are processed. The processing isdescribed in terms of the number of steps involved, the number ofprocessors, the number of disk units and the number of partitions of thedisk units. The preconditions for the processing are assumed as follows;

FES processing (data reception): 30 (K steps)

BES processing (single item update): 60 (K steps)

BES processing (data retrieval): 220 (K steps)

Transmission: 6 (K steps)

Reception: 6+4*page count (K steps)

Issuance of input/output request: 4+4*page count (K steps)

Processor performance: 10 (M steps per second)

Input/output performance (one-page access): 20 (msec)

Input/output performance (10-page access): 30 (msec)

A. Single Item Update (One-Page Access)

(1) With System Configuration Comprising IOS Node

-   -   Dividing the processor performance (10 M steps per second) by        the FES processing step count (30 K steps) provides an available        reception count-of up to 333 times per second.

The necessary step count for single item update by a BES node is givenas: reception of an execution request from the FES node (6 Ksteps)+transmission of a data retrieval request from the BES node (6 Ksteps)+reception of retrieved data from IOS node (10 K steps)+singleitem update (60 K steps)+transmission of the result of the executionrequest to FES node (6 K steps)=88 (K steps). Dividing the processorperformance (10 M steps per second) by the obtained single item updatestep count (88 K steps) for the BES node provides an available singleitem update count of up to 114 times per second.

In addition, the necessary step count for the IOS node to access diskunits is given as: reception of an input/output request from the BESnode (6 K steps)+issuance of the input/output request (8 Ksteps)+transmission of the result of the input/output request to the BESnode (6 K steps)=20 (K steps). Dividing the processor performance (10 Msteps per second) by the obtained disk access step count of 20 (K steps)for the IOS node provides an available disk access count of up to 500times per second.

Because it takes 20 msec to perform random input/output of one page, onedisk unit may be accessed up to 50 times per second. When the maximumavailable disk access count of 500 times per second for the IOS node isdivided by the single disk access count of 50 times per second, theresult indicates that up to 10 disk units may be connected to the IOSnode.

When the maximum available disk access count of 500 times per second forthe IOS node is divided by the single item update count of 114 times persecond for the BES node, the result indicates that one IOS node mayaddress up to 4.3 BES nodes.

When the maximum available reception count of up to 333 times per secondfor the FES node is divided by the single item update count of 114 timesper second for the BES node, the result indicates that one FES node mayaddress up to three BES nodes.

The results above are summarized into ratios of FES:BES=1:3,BES:IOS=4.3:1, and IOS:disk units=1:10. Putting these ratios togetherprovides a substantially balanced configuration composed of one FESnode, four BES nodes, one IOS node and eight disk units, as shown inFIG. 3 (with a minor imbalance regarding the FES node and the diskunits).

(2) With System Configuration where BES Nodes Furnish IOS Node Functions

-   -   Dividing the processor performance (10 M steps per second) by        the FES processing step count (30 K steps) provides an available        reception count of up to 333 times per second.

The necessary step count for single item update by a BES node is givenas; reception of an execution request from the FES node (6 Ksteps)+issuance of an input/output request (8 K steps)+single itemupdate (60 K steps)+transmission of the result of the execution requestto FES node (6 K steps)=80 (K steps). Dividing the processor performance(10 M steps per second) by the obtained single item update step count(80 K steps) provides an available single item update count of up to 125times per second.

Because it takes 20 msec to perform random input/output of one page, onedisk unit may be accessed up to 50 times per second. When the maximumavailable single item update count of 125 times per second for the BESnode is divided by the single disk access count of 50 times per second,the result indicates that up to 2.5 disk units may be connected to theBES node.

When the maximum available reception count of up to 333 times per secondfor the FES node is divided by the single item update count of 125 timesper second for the BES node, the result indicates that one FES node mayaddress up to 2.6 BES nodes.

The results above are summarized into ratios of FES:BES=1:2.6 andBES:disk units=1:2.5. Putting these ratios together provides asubstantially balanced configuration composed of one FES node, four BESnodes and eight disk units, as shown in FIG. 4 (with a minor imbalanceregarding the FES node).

B. Data Retrieval (10-Page Access)

(1) With System Configuration Comprising IOS Node

-   -   Dividing the processor performance (10 M steps per second) by        the FES processing step count (30 K steps) provides an available        reception count of up to 333 times per second.

The necessary step count for data retrieval by a BES node is given as;reception of an execution request from the FES node (6 Ksteps)+transmission of a data retrieval request from the BES node (6 Ksteps)+reception of retrieved data from IOS node (46 K steps)+retrievalof data (220 K steps)+transmission of the result of the executionrequest to FES node (6 K steps)=284 (K steps). Dividing the processorperformance (10 M steps per second) by the obtained data retrieval stepcount (284 K steps) provides an available data retrieval count of up to35 times per second.

In addition, the necessary step count for the IOS node to access diskunits is given as: reception of an input/output request from the BESnode (6 K steps)+issuance of the input/output request (44 Ksteps)+transmission of the result of the input/output request to the BESnode (6 K steps)=56 (K steps). Dividing the processor performance (10 Msteps per second) by the obtained step count provides an available diskaccess count of up to 179 times per second.

Because it takes 30 msec to perform batch input/output of 10 pages, onedisk unit may be accessed up to 33 times per second. When the maximumavailable disk access count of 179 times per second for the IOS node isdivided by the single disk access count of 33 times per second, theresult indicates that up to 5.4 disk units may be connected to the IOSnode.

When the maximum available disk access count of 179 times per second forthe IOS node is divided by the available data retrieval count of 35times per second for the BES node, the result indicates that one IOSnode may address up to 5.1 BES nodes.

In addition, when the maximum available reception count of 333 times persecond for the FES node is divided by the available data retrieval countof 35 times per second for the BES node, the result indicates that oneFES node may address up to 9.5 BES nodes.

The results above are summarized into ratios of FES:BES=1:9.5,BES:IOS=5.1:1, and IOS:disk units=1:5.4. Putting these ratios togetherprovides a substantially balanced configuration composed of one FESnode, 10 BES nodes, two IOS nodes and 10 disk units (with a minorimbalance regarding the disk units).

(2) With System Configuration where BES Nodes Furnish IOS Node Functions

Dividing the processor performance (10 M steps per second) by the FESprocessing step count (30 K steps) provides an available reception countof up to 333 times per second.

The necessary step count for data retrieval by a BES node is given as:reception of an execution request from the FES node (6 K steps)+issuanceof an input/output request (44 K steps)+retrieval of data (220 Ksteps)+transmission of the result of the execution request to FES node(6 K steps)=276 (K steps). Dividing the processor performance (10 Msteps per second) by the obtained data retrieval step count (276 Ksteps) for the BES node provides an available data retrieval count of upto 36 times per second.

Because it takes 30 msec to perform batch input/output of 10 pages, onedisk unit may be accessed up to 33 times per second. When the maximumavailable data retrieval count of 36 times per second for the BES nodeis divided by the single disk access count of 33 times per second, theresult indicates that one disk unit may be connected to the BES node.

When the maximum available reception count of up to 333 times per secondfor the FES node is divided by the available data retrieval count of 36times per second for the BES node, the result indicates that one FESnode may address up to 9.2 BES nodes.

The results above are summarized into ratios of FES:BES=1:9.2 andBES:disk unit=1:1.1 Putting these ratios together provides asubstantially balanced configuration composed of one FES node, 10 BESnodes and 10 disk units.

FIG. 5 is a block diagram of the FES node 75. The FES node 75 comprisesuser-generated application programs 10-11, a parallel databasemanagement system 20 for providing overall database system managementsuch as query processing and resource management, and an operatingsystem 30 for managing all computer system operations including thereading and writing of data.

The parallel database management system 20 has a system controller 21, alogical processor 22, a physical processor 23, and a database dictionarybuffer 24 for temporarily accommodating target data. The paralleldatabase management system 20 is connected to the network 90 as well asto another parallel database management system.

The system controller 21 primarily manages input/output and otheroperations. The system controller 21 has a data load processing program210 and a dynamic load control processing program 211.

The logical processor 22 includes a query analysis program 220 foranalyzing queries in terms of syntax and semantics, a staticoptimization processing program 221 for generating appropriateprocessing procedure candidates, and a code generator 222 for generatingthe code applicable to each processing procedure candidate generated.The logical processor 22 further includes a dynamic optimizationprocessing program 223 for selecting an optimal processing procedure anda code interpreter 224 for interpreting the code of the selectedprocessing procedure candidate.

The physical processor 23 comprises a data access processing program 230that edits accessed data and judges them for conditions, and addsrecords; a database dictionary buffer controller 231 that controls thereading and writing of database records; and a concurrency controller233 that provides concurrency control over the resources shared by thesystems.

FIG. 6 is a block diagram of the BES node 73. The BES node 73 iscomposed of the parallel database management system 20 for overalldatabase system management and the operating system 30 for overallcomputer system management. Where the BES node 73 is equipped with IOSnode functions, the BES node 73 has disk units in which to store andmanage a database 40.

The parallel database management system 20 comprises the systemcontroller 21, the logical processor 22, the physical processor 23, anda database buffer 24 for temporarily accommodating target data. Thisparallel database management system 20 is connected to the network 90and to another parallel database management system 20.

The system controller 21 manages input/output and other operations. Thesystem controller 21 includes the data load processing program 210 thatloads data by taking load distribution into consideration.

The logical processor 22 comprises the code interpreter 224 that carriesout code interpretation.

The physical processor 23 includes the data access processing program230 that edits accessed data and judges them for conditions, and addsrecords; the database buffer controller 231 that controls the readingand writing of database records; a mapping processing program 232 thatmanages the locations at which the target data to be input or output arestored; and the concurrency controller 233 that provides concurrencycontrol over the resources shared by the systems.

FIG. 7 is a block diagram of the IOS node 70 together with the disk unit80. The IOS node 70 comprises the parallel database management system 20for overall database system management and the operating system 30 foroverall computer system management. The disk unit 80 contains thedatabase 40.

The parallel database management system 20 comprises the systemcontroller 21, the physical processor 23, and an input/output buffer 24for temporarily accommodating target data. This parallel databasemanagement system 20 is connected to the network 90 as well as toanother parallel database management system.

The system controller 21 manages input/output and other operations. Thesystem controller 21 includes the data load processing program 210 thatloads data by taking load distribution into consideration.

The physical processor 23 comprises the data access processing program230 that edits accessed data and judges them for conditions, and addsrecords; and an input/output buffer controller 231 that controls thereading and writing of database records.

FIG. 8 is a block diagram of the DS 71 together with the disk unit 81.The DS 71 comprises the parallel database management system 20 foroverall database system management and the operating system 30 foroverall computer system management. The disk unit 81 contains adictionary 50.

The parallel database management system 20 comprises the systemcontroller 21, the logical processor 22, the physical processor 23, anda dictionary buffer 24. This parallel database management system 20 isconnected to the network 90 as well as to another parallel databasemanagement system.

The logical processor 22 includes the code interpreter 224 that carriesout code interpretation.

The physical processor 23 includes the data access processing program230 that edits accessed data and judges them for conditions, and addsrecords; the dictionary buffer controller 231 that controls the readingand writing of dictionary records; and the concurrency controller 233that provides concurrency control over the resources shared by thesystems.

FIG. 9 is a block diagram of the JS 72 together with the disk unit 82.The JS 72 comprises the parallel database management system 20 foroverall database system management and the operating system 30 foroverall computer system management. The disk unit 82 contains a journal60.

The parallel database management system 20 comprises the systemcontroller 21, the physical processor 23, and a journal buffer 24. Thisparallel database management system 20 is connected to the network 90 aswell as to another parallel database management system.

The physical processor 23 includes the data access processing program230 that edits accessed data and judges them for conditions, and addsrecords; and a journal buffer controller 231 that controls the readingand writing of journal records.

FIG. 10 is a flowchart of steps performed by the system controller ofthe parallel database management system 20 in the FES node 75. Thesystem controller 21 first checks to see if a query analysis process isselected (step 212). If the query analysis process is found to beselected, the system controller 21 calls and executes a call queryanalysis program 400. After execution of the call query analysis program400, the system controller 21 ends its operation.

If the query analysis process is not in effect in step 212, the systemcontroller 21 checks to see if a query execution process is selected(step 213). If the query execution process is found to be selected, thesystem controller 21 calls and executes a query execution program 410.After execution of the query execution program 410, the systemcontroller 21 ends its operation.

If the query execution process is not in effect in step 213, the systemcontroller 21 checks to see if a data load process is selected (step214). If the data load process is found to be selected, the systemcontroller 21 calls and executes a data load program 210. Afterexecution of the data load program 210, the system controller 21 endsits operation.

If the data load process is not in effect in step 214, the systemcontroller 21 checks to see if a dynamic load control process isselected (step 215). If the dynamic load control program is found to beselected, the system controller 21 calls and executes a dynamic loadcontrol program 211. After execution of the dynamic load control program211, the system controller 21 end its operation.

If the dynamic load control process is not in effect in step 215, thenthe system controller 21 terminates its operation.

The flowchart of steps performed by the database management system 20 ofthe BES node 73 is that of FIG. 10 minus steps 212, 215, 400 and 211.The flowchart of steps carried out by the database management system 20of the IOS node 70 is that of FIG. 10 minus steps 212, 213, 215, 400,410 and 211.

FIG. 11 is a flowchart of steps performed by the query analysis program400. In step 220 for query analysis, the program 400 analyzes the inputquery sentence for syntax and semantics.

In step 221 for static optimization, the query analysis program 400estimates the ratio of the data that would satisfy the conditionoccurring in the query. Then based on predetermined rules, the program400 generates effective access path candidates (especially indices) soas to prepare a processing procedure candidate.

In step 222 for code generation, the query analysis program 400translates the processing procedure candidate into an executable formcode. This terminates the processing of the query analysis program 400.

FIG. 12 is a flowchart of steps representing the query analysis 220. Instep 2200, the input query sentence is analyzed for syntax andsemantics. Then the current processing is terminated.

FIG. 13 is a flowchart of steps representing the static optimizationprocess 221. In step 2210 for predicate selectivity estimation, theselectivity of the predicate of the condition occurring in the query isestimated.

In step 2211 for access path pruning, the access paths including indicesare pruned.

In step 2212 for processing procedure candidate generation, theprocessing procedure candidate combining the access paths is generated.This terminates the static optimization processing.

FIG. 14 is a flowchart of steps representing the predicate selectivityestimation process 2210. In step 22101, a check is made to see if anyvariable appears in the query condition. If no variable appears in thequery condition, step 22102 is reached. If a variable appears in thequery condition, step 22104 is reached.

In step 22102, a check is made to see if the query condition containscolumn value frequency information. If the column value frequencyinformation is present, step 22103 is reached. If the column valuefrequency information is not found, step 22105 is reached.

In step 22103, the selectivity is calculated by use of the column valuefrequency information, and the current processing is terminated.

In step 22104, a check is made to see if the query condition containscolumn value frequency information. If the column value frequencyinformation is present, the current processing is terminated; if no suchinformation exists, step 22105 is reached.

In step 22105, a default value is set in accordance with the kind of thequery condition. The current processing is then terminated.

FIG. 15 is a flowchart of steps representing the access path pruningprocess 2212. In step 22120, column indices appearing in the conditionare set as access path candidates.

In step 22121, a check is made to see if the table to be accessed forthe query is stored separately in a plurality of nodes. If the table isnot stored separately, step 22122 is reached; if the table is storedseparately, step 22123 is reached.

In step 22122, sequential scans are set as access path candidates.

In step 22123, parallel scans are set as access path candidates.

In step 22124, a check is made to see if the selectivity of eachcondition is already set. If the selectivity is found to be already set,step 22125 is reached; if the selectivity has yet to be set, step 22126is reached.

In step 22125, the highest access path priority is given to the index ofthe condition whose selectivity is the smallest regarding each table.

In step 22126, the maximum and minimum values of the selectivity of eachcondition are obtained.

In step 22127, the reference for selecting each access path iscalculated from system characteristics such as the processor performanceand I/O performance.

In step 22128, only those access paths of combined single or pluralindices whose selectivity is less than the above reference are set asaccess path candidates.

FIG. 16 is a flowchart of steps representing the processing procedurecandidate generation process 2213. In step 22130, a check is made to seeif the table to be accessed for the query is stored separately in aplurality of nodes. If the table is not stored separately, step 22131 isreached; if the table is stored separately, step 22135 is reached.

In step 22131, a check is made to see if a sorting process is includedin the processing procedure candidate. If the sorting process is notincluded, step 22132 is reached; if the sorting process is-included,step 22135 is reached.

In step 22132, a check is made to see if there is only one access pathof the table to be accessed. If only one access path exists, step 22133is reached; if more than one access path is present, step 22134 isreached.

In step 22133, a single processing candidate is generated, and thecurrent processing is terminated.

In step 22134, a plurality of processing candidates are generated, andthe current processing is terminated.

In step 22135, the query is broken up into joinable two-way joins.

In step 22136, data read/data distribution processing procedures andslot sort processing procedures are set as candidates in correspondencewith the stored nodes of the table stored separately.

In step 22137, slot sort processing procedures, N-way merge processingprocedures and join processing procedures are set as candidates incorrespondence with the join processing nodes. The slot sort processingrefers to an intra-page sorting process in which the page foraccommodating data has each of its rows managed by a slot that is offsetfrom the beginning of the page. When the slots are read consecutively,the corresponding rows are accessed in ascending order. The N-way mergeprocessing refers to a process in which an N-way buffer is used to inputN runs at each merge stage for sorting by tournament into a single sortrun.

In step 22138, a requested data output processing procedure is set tothe request data output node.

In step 22139, a check is made to see if estimations are finished on alltables. If not all estimations are finished, step 22136 is reachedagain; if all estimations are found to be finished, the currentprocessing is terminated.

FIG. 17 is a flowchart of steps carried out by the code generator 222.In step 2220, a check is made to see if there is only one processingprocedure candidate. If more than one processing procedure candidate isfound, step 2221 is reached; if there exists only one processingprocedure candidate, step 2223 is reached.

In step 2221, optimization information composed of column valuefrequency information and others is embedded into the processingprocedures.

In step 2222, the data structure for selecting processing procedures isgenerated on the basis of the constants substituted upon queryexecution.

In step 2223, the processing procedure is translated into executionformulae. The processing is then terminated.

FIG. 18 is a flowchart of steps performed by the query execution program410. In step 223 for dynamic optimization processing, the processingprocedure to be executed on each node group is determined on the basisof the substituted constants.

In step 224 for code interpretation, the processing procedure isinterpreted and executed. The current processing is then terminated.

FIG. 19 is a flowchart of steps performed by the dynamic optimizationprocessing program 223. In step 22300, a dynamic load control program iscalled and executed.

In step 22301, a check is made to see if there is only one processingprocedure being generated. If there is only one processing procedurebeing generated, the current processing is terminated. If there is morethan one processing procedure being generated, step 22302 is reached.

In step 22302, the selectivity is calculated on the basis of thesubstituted constants.

In step 22303, a check is made to see if parallel processing procedurecandidates are included in the processing procedure candidates. Ifparallel processing procedure candidates are found to be included, step22304 is reached; if no such candidates are found, step 22308 isreached.

In step 22304, the optimization information (column value frequencyinformation of join columns, number of rows in the table to be accessed,page count, etc.) is input from the dictionary.

In step 22305, the processing time for data retrieval and datadistribution is calculated in consideration of various systemcharacteristics.

In step 22306, the assigning number p of join nodes is determined basedon the processing time.

In step 22307, data distribution information is generated on the basisof the optimization information.

In step 22308, the processing procedure is selected by use of the accesspath selection reference.

FIG. 20 is a flowchart of steps carried out by the code interpreter 224.In step 22400, a check is made to see if data retrieval and a datadistribution process are in effect. If data retrieval and the datadistribution process are found to be in effect, step 22401 is reached.If data retrieval and the data distribution process are not found to bein effect, step 22405 is reached. In step 22401, the database isaccessed and the condition is evaluated.

In step 22402, data is set into a buffer of each node based on the datadistribution information.

In step 22403, a check is made to see if the buffer of the node inquestion is full. If the buffer is found to be full, step 22404 isreached; if the buffer is not full, step 22420 is reached.

In step 22404, data is transferred in page form to the correspondingnode. Step 22404 is followed by step 22420.

In step 22405, a check is made to see if a slot sort process is ineffect. If the slot sort process is found to be in effect, step 22406 isreached; if the slot sort process is not found to be in effect, step22409 is reached.

In step 22406, page form data is received from another node.

In step 22407, the slot sort process is executed.

In step 22408, the result of the slot sort process is storedtemporarily. Step 22408 is followed by step 22420.

In step 22409, a check is made to see if an N-way merge process is ineffect. If the N-way merge process is found to be in effect, step 22410is reached; if the N-way merge process is not found to be in effect,step 22412 is reached.

In step 22410, the N-way merge process is executed.

In step 22411, the result of the N-way merge process is storedtemporarily. Step 22411 is followed by step 22420.

In step 22412, a check is made to see if a join process is in effect. Ifthe join process is found to be in effect, step 22413 is reached; if thejoin process is not found to be in effect, step 22416 is reached.

In step 22413, both sort lists are joined and data is set to a bufferfor output.

In step 22414, a check is made to see if the buffer for output is full.If the buffer for output is found to be full, step 22415 is reached; ifthe buffer for output is not found to be full, step 22420 is reached.

In step 22415, data is transferred in page form to the request dataoutput node. Step 22415 is followed by step 22420.

In step 22416, a check is made to see if a request data output processis in effect. If the request data output process is found to be ineffect, step 22417 is reached. If the request data output process is notfound to be in effect, step 22420 is reached.

In step 22417, a check is made to see if page form data is transferredfrom another node. If page form data is found to be transferred fromanother node, step 22418 is reached; if no such data is found to betransferred, step 22419 is reached.

In step 22418, page form data is received from another node.

In step 22419, the result of the query processing is output to theapplication program.

In step 22420, a check is made to see if the BES node is in execution.If the BES node is found to be in execution, step 22421 is reached; ifthe BES node is not in execution, the current processing is terminated.

In step 22421, the FES is notified of that information for estimatingprocessing load which includes the access page count, the hit row countand the number of communications. The current processing is thenterminated.

FIG. 21 is a flowchart of steps performed by the data load program 210.Before each step of the data load process is explained, generalconcepts-of the process will be outlined below. The data load methodcomes in three kinds: one method distributes data with the emphasis on atarget response time, whereby the time required to scan the entire tableis limited below a predetermined level; another method distributes datawith the emphasis on degrees of parallelism, whereby m pages areaccessed for optimal parallel processing; yet another method distributesdata as desired under user control, whereby all volume partitions aredesignated by the user.

The distribution of data with the emphasis on the target response timeinvolves initially finding the number of pages in which to store therows of the entire table. Then the upper limit number of pages to bestored in the disk units partitioned for parallel access is determined.If necessary, batch input (e.g., of 10 pages) is presupposed for access.Load distribution is determined in view of the combination of the numberof disk units, the number of IOS nodes and the number of BES nodes. Ifthere exists a key range division, the divided key range is subdividedby the upper limit page count and data is stored into the subdivided keyranges of the disk units. More aspects of this process will be describedlater in detail with reference to FIG. 23.

The distribution of data with the emphasis on degrees of parallelism isdependent on the size m which is preferred to be considerably large. Ifthere is a key range division, the number of sub-key range-storing pagess for the divided key range is determined on the basis of the size m andof the expected degree of parallelism p (=m/p). Data is stored in unitsof s pages into the subdivided key ranges of the disk units.

The expected degree of parallelism p is calculated as the square root ofthe ratio given by dividing the response time by the overhead per node.When so obtained, the ratio of 10 corresponds to the expected degree ofparallelism of 3, 100 to 10, 1,000 to 32, and 10,000 to 100. If thecalculated degree of parallelism p is higher than the existing partitioncount, the existing partition count is selected (because it determinesthe maximum number of disk units that may be processed). Conversely, ifthe calculated expected degree of parallelism p is lower than theexisting partition count, the expected degree of parallelism p isselected with the existing partition count taken as the upper limit.

Specifically, suppose that the distribution of data optimized for100-page access is to be calculated. As a precondition, batch input isto be carried out in units of 10 pages. Because it takes 300 msec toperform one I/O operation (i.e., 10-page access) and 5.6 msec to carryout one I/O overhead operation (56 ks required for 10-MIPS performance),the expected degree of parallelism p is about 7 (=√{300/5.6}). Thus thekey range is subdivided into sub-key ranges in units of 14 pages(s=100/7).

The user-designated distribution of data is effected in the same manneras with conventional database management systems. That is, datadistribution is managed as set by parameters.

In step 21000 of FIG. 21, a check is made to see if data distributionputs the emphasis on the target response time. If the emphasis is notfound to be placed on the target response time, step 21001 is reached.If the emphasis is found to be placed on the tar get response time, step21003 is reached.

In step 21001, a check is made to see if data distribution puts theemphasis on the expected degree of parallelism. If the emphasis is notfound to be placed on the expected degree of parallelism, step 21002 isreached. If the emphasis is found to be placed on the expected degree ofparallelism, step 21010 is reached.

In step 21002, a check is made to see if a user's designation exists. Ifthe user's designation is found, step 21016 is reached. If the user'sdesignation is not found, the current processing is terminated.

In step 21003, the number of necessary pages for storing the rows of thewhole table is calculated.

Step 21004 decides the upper limit number of pages to be stored intodisk units which are parallelly accessible within a predetermined timethat is needed to scan the whole table.

In step 21005, the BES nodes, IOS nodes and disk units which satisfy thepreceding requirement are determined.

In step 21006, a check is made to see if there is a key range division.If there exists a key range division, step 21007 is reached. If there isno key range division, step 21009 is reached.

In step 21007, the divided key range is subdivided by an upper limitpage count.

In step 21008, data is inserted into the key range subdivisions.Thereafter, the current processing is terminated.

In step 21009, data is inserted as partitioned by the upper limit pagecount. Then the current processing is terminated.

In step 21010, the optimum page access count m is calculated on thebasis of the estimated work load.

In step 21011, the expected degree of parallelism p is calculated, andthe BES node, IOS node and disk units are determined accordingly.

In step 21012, a check is made to see if there exists a key rangedivision. If there is a key range division, step 21013 is reached. Ifthere is no key range division, step 21015 is reached.

In step 21013, the number of pages s to be stored in sub-key range unitsis calculated (s=m/p).

In step 21014, pages are divided in units of s pages into the sub-keyranges and data is inserted into each of the disk units. Then thecurrent processing is terminated.

In step 21015, data is inserted as partitioned in units of s pages toeach disk unit.

In step 21016, data is inserted to the disk units managed by theuser-designated IOS node. The current processing is then terminated.

FIG. 22 is a flowchart of steps performed by the dynamic load controlprogram 211. In step 21100, a check is made to see if there is a loadunbalance (access concentrated or discrete). Specifically, the congestedresources (processors (BES, IOS), disk units) are detected from thedatabase processing load executed in the unit time per node (i.e., theload is composed of the number of processing steps (for DB processing,I/O processing and communication processing), of the processorperformance (converted to processing time), and of the I/O operationcount (converted to I/O time)). The DB processing is then translatedinto SQL sentences and the status of access to each resource is sortedin units of tables. If a load unbalance is detected, step 21101 isreached; if no load unbalance is detected, the current processing isterminated.

In step 21101, a check is made based on the access distributioninformation to see whether it is necessary to add or delete BES nodes,IOS nodes and/or disk unit pairs. If such addition or deletion isnecessary, step 21102 is reached; if no such addition or deletion isnecessary, the current processing is terminated.

In step 21102, a check is made to see if any more server needs to beadded. If such addition is needed, step 21103 is reached; if no suchaddition is needed, step 21112 is reached.

In step 21103, a check is made to see if online processing is inprogress. If online processing is found to be in progress, step 21104 isreached. If online processing is not in effect, step 21105 is reached.

In step 21104, the key range of the table managed by the target BESnodes is closed.

In step 21105, new BES nodes are assigned.

In step 21106, lock information and directory information are succeeded.

In step 21107, the DS 71 is ordered to update the dictionary informationnecessary for node assignment control.

In step 21108, a check is made to see if an IOS node exists. If no IOSnode is found, step 21109 is reached. If an IOS node is found to exist,step 21110 is reached. This step is inserted here so that the samedynamic load control program may address two kinds of systemconfigurations: one wherein the IOS exists and the other in which no IOSexists.

In step 21109, data is transferred from the target BES nodes to the newBES nodes.

In step 21110, a check is made to see if online processing is inprogress. If online processing is found to be in progress, step 21111 isreached. If online processing is not in effect, the current processingis terminated.

In step 21111, the closing of the key range of the table managed by thetarget BES nodes is released. Then the current processing is terminated.

In step 21112, a check is made to see if online processing is inprogress. If online processing is found to be in progress, step 21113 isreached. If online processing is not in effect, step 21114 is reached.

In step 21113, the key range of the table managed by the target BESnodes is closed.

In step 21114, the BES nodes to be degenerated are determined.

In step 21115, the lock information and directory information aresucceeded.

In step 21116, the DS 71 is ordered to update the dictionary informationnecessary for node assignment control.

In step 21117, a check is made to see if an IOS node exists. If no IOSnode is found, step 21118 is reached. If an IOS node is found to exist,step 21119 is reached.

In step 21118, data is purged out of the BES nodes to be degenerated.

In step 21119, a check is made to see if online processing is inprogress. If online processing is found to be in progress, step 21120 isreached. If online processing is not in effect, the current processingis terminated.

In step 21120, the closing of the-key range of the table managed by thetarget BES nodes is released. Then the current processing is terminated.

FIG. 23 is a conceptual view of dynamic data load control using the keyrange division scheme. For this setup, it is assumed that the partitioncount is 4 and that the column values v1-v6 of the database take theoccurrence frequencies of FIG. 21.

At initial data load time, only one BES node 731 is needed. The upperlimit number of pages to be stored is made to correspond with each ofthe partitions 810-840 constituted by the disk units. In this setup, thecolumn values v1-v2 are stored in the disk units of the partition 810;the column values v2-v3 are stored in the disk units of the partitions820-830; the column values v3-v5 are stored in the disk units of thepartition 840; and the column values v5-v6 are stored in other diskunits. At initial data load time, the directory information is generatedfor each disk unit so as to manage the pages stored therein.

Where the BES nodes 732-734 are used selectively in accordance with theload, the directory information regarding each disk unit correspondingto the configured BES nodes is utilized for access to the database.

The processes described above may be implemented in other variations ofthe invention outlined below. In one variation, information about BESpositions is not included in row identifiers so as to facilitate thetransfer of rows between nodes. A BES node determines the physicalposition of a given row by combining the row identifier with thedirectory information for defining the divided positions of the table.Row transfer is accomplished by updating the directory information.There may be provided a structure capable of dealing with nodereconfiguration or row transfer, the structure allowing the processingto be partitioned through succession of the directory information andlock information even as BES nodes are added dynamically.

If it is desired to manage the database in a replica setup, the storagearea needs to be doubled. This means that the disk access load isapproximately doubled whether or not primary and backup copies aremanaged by the same IOS and BES nodes. Therefore the number of volumesfor each partition managed with the exist partition count need only behalved.

In case of a disk unit, IOS or BES failure, the failed unit or node isdisconnected from the online processing. The failed unit or node isrepaired and then reconnected to the online processing. The way in whichthe closing of the key range is managed varies with the node.Specifically, in case of a disk unit failure, the key range stored inthat disk unit is closed. If a backup copy exists (the backup copy needsto be acquired under management of the same IOS node (mirror disk unit)or of another IOS node (data replica)), the processing is reassigned. Incase of an IOS node failure, the key range stored in that IOS node isclosed. If a backup copy exists (the backup copy needs to be acquiredunder management of another IOS node (data replica)), the processing isreassigned. In case of a BES node failure, the key range managed by thatBES node is closed. If an IOS node exists, new BES nodes are assigned,lock information is succeeded, the dictionary information necessary fornode assignment control is updated, and the processing is allowed tocontinue.

This invention is not limited to systems wherein rules based onstatistical information and cost evaluations are used in combination.The invention may also be applied to database management systemsperforming optimization processing by use of cost evaluations alone, therules alone, or the combination of cost evaluations and the rules, aslong as processing procedures offering appropriate database referencecharacteristic information can be acquired thereby.

This invention may be practiced via a software system for a tightly orloosely coupled composite processor system in a mainframe computer. Itis also possible to practice the invention via a tightly or looselycoupled composite processor system using a dedicated processor for eachof the component processing programs. The invention may also be appliedto a single processor system if the system assigns parallel processes toeach processing procedure.

Furthermore, the invention may be applied to configurations wherein aplurality of processors share each of a plurality of disk units.

The database divisional management method of the invention renders thesystem configuration suitable for the load in question. This providesthe desired degree of parallelism and permits high-speed queryprocessing.

The parallel database system according to the invention is thus ascalable parallel database system capable of altering the systemconfiguration constantly in keeping with any fluctuation in load.

As many apparently different embodiments of this invention may be madewithout departing from the spirit and scope thereof, it is to beunderstood that the invention is not limited to the specific embodimentsthereof except as defined in the appended claims.

1. A database management method comprising the steps of: defining acorrespondence between a plurality of key ranges and a plurality of datastorage areas in a storage device; storing data in a data storage areawhich corresponds to a key range containing said data, when storing saiddata in a database; dividing said key range containing said data into aplurality of divided key ranges, when it is necessary to divide said keyrange containing said data; and defining a correspondence between thedivided key ranges and said plurality of data storage areas.
 2. Adatabase management method according to claim 1, further comprising thestep of: defining a correspondence of at least one of said divided keyranges to a data storage area whose correspondence with a key range isnot defined.
 3. A database management method according to claim 2,further comprising the step of: storing the data corresponding to saidat least one divided key range to the data storage area whosecorrespondence with said one divided key range is defined.
 4. A databasemanagement method according to claim 2, further comprising the step of:deleting data corresponding to said divided key range from said datastorage area corresponding to the key range necessary to be divided. 5.A database management method comprising the steps of: defining acorrespondence between a plurality of key ranges and a plurality of datastorage areas in a storage device; dividing at least one of saidplurality of key ranges, when necessary to divide; and defining acorrespondence between the divided key ranges and said plurality of datastorage areas.
 6. A database management method according to claim 5,further comprising the step of: defining a correspondence of at leastone of said divided key ranges to a data storage area whosecorrespondence with a key range is not defined.
 7. A database managementmethod comprising the steps of: defining a correspondence between aplurality of key ranges and a plurality of data storage areas in astorage device; dividing at least one of said plurality of key ranges,when a request is inputted to divide; and defining a correspondencebetween the divided key ranges and said plurality of data storage areas.8. A database management method according to claim 7, further comprisingthe step of: defining a correspondence of at least one of said dividedkey ranges to a data storage area whose correspondence with a key rangeis not defined.